Rokid is collaborating with leading AI companies to develop next-generation smart glasses that integrate generative AI and AI agents into a new operating system and interface. With on-device multimodal models, the glasses support voice, vision, and touch interaction for a new kind of user experience.
Sogou Input Method 20.0 launches with full AI integration, evolving from a typing tool into an intelligent assistant. Large models upgrade its core interactions, improving the accuracy, speed, and intelligence of voice input, typing, and translation.
JD Technology and Rokid have jointly launched JoyGlance, billed as the world's first shopping application for smart glasses. It combines large AI models with waveguide display technology to enable voice-activated shopping and simplify user operations.
Bangalore-based voice AI startup Arrowhead raised $3M in a seed round led by Stellaris Venture Partners, with participation from angel investors including CRED founder Kunal Shah. The funds will go toward improving its AI models, expanding the team, and growing in the financial services market.
Voiceley offers fast, free AI voice cloning and can also generate voices using its voice models.
Provides ASR, TTS, and LLM models for voice AI, which can be tested and deployed for real-time applications.
Developers can interactively experience the new voice models gpt-4o-transcribe, gpt-4o-mini-transcribe, and gpt-4o-mini-tts in the OpenAI API.
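As a rough sketch of what calling one of these models looks like outside the interactive demo, the snippet below builds (but does not send) a request to OpenAI's `/v1/audio/speech` endpoint for `gpt-4o-mini-tts` using only the Python standard library. The helper name `build_speech_request`, the `alloy` voice, and the `mp3` response format are illustrative choices; the transcription models (`gpt-4o-transcribe`, `gpt-4o-mini-transcribe`) use the separate `/v1/audio/transcriptions` endpoint with a multipart file upload, which is not shown. Check OpenAI's API reference for the current parameter list before relying on this.

```python
import json
import os
import urllib.request
from typing import Optional

# OpenAI text-to-speech endpoint
SPEECH_URL = "https://api.openai.com/v1/audio/speech"


def build_speech_request(text: str, voice: str = "alloy",
                         api_key: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a POST request asking gpt-4o-mini-tts to speak `text`.

    Hypothetical helper for illustration; voice and format are example choices.
    """
    key = api_key or os.environ.get("OPENAI_API_KEY", "")
    body = {
        "model": "gpt-4o-mini-tts",
        "input": text,
        "voice": voice,
        "response_format": "mp3",
    }
    return urllib.request.Request(
        SPEECH_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_speech_request("Hello from the OpenAI audio API.")
    # Actually sending it needs a valid OPENAI_API_KEY and network access:
    # with urllib.request.urlopen(req) as resp, open("hello.mp3", "wb") as f:
    #     f.write(resp.read())
    print(req.get_method(), req.full_url)
```

Building the request separately from sending it keeps the payload easy to inspect and test without spending API credits.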
Scalable streaming voice synthesis technology powered by large language models.
Google: Input tokens/M $0.49, Output tokens/M $2.1, Context Length 1k
Openai
$2.8
$11.2
Xai
$1.4
$3.5
2k
$7.7
$30.8
200
-
Anthropic
$105
$525
$0.7
$7
$35
$17.5
$21
Alibaba
$2
$20
$4
$16
Baidu
128
$6
$24
256
$1
$10
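Per-million-token prices like those above translate into a per-request cost by scaling each price by the token count. A minimal worked example using the Google row ($0.49 per million input tokens, $2.1 per million output tokens) and an assumed request of 10,000 input tokens and 2,000 output tokens; the function name `request_cost` is hypothetical:

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Cost of one request, given prices quoted per million tokens."""
    return ((input_tokens / 1_000_000) * input_price_per_m
            + (output_tokens / 1_000_000) * output_price_per_m)


# Google row: $0.49 per million input tokens, $2.1 per million output tokens
cost = request_cost(10_000, 2_000, 0.49, 2.1)
print(f"${cost:.4f}")  # $0.0091  (0.0049 for input + 0.0042 for output)
```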
unsloth
Spark-TTS is an efficient text-to-speech system based on large language models (LLMs), supporting bilingual Chinese and English synthesis with zero-shot voice cloning.
IbrahimAmin
This automatic speech recognition model is fine-tuned from wav2vec2-large-xlsr-53 and optimized for Egyptian Arabic, Modern Standard Arabic, and Gulf/Levantine Arabic. It is trained on multiple Arabic speech datasets and achieves a word error rate of 27.20% on the Common Voice 17.0 Arabic test set, outperforming many similar models.
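Word error rate (WER) is conventionally computed as (substitutions + deletions + insertions) divided by the number of words in the reference transcript, so a WER of 27.20% means roughly 27 word-level errors per 100 reference words. A minimal sketch using word-level Levenshtein distance; it splits on whitespace only and applies no text normalization, which real Arabic evaluations typically do and which affects the reported number:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (S + D + I) / N via word-level Levenshtein distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    # dp[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,        # deletion
                dp[i][j - 1] + 1,        # insertion
                dp[i - 1][j - 1] + sub,  # substitution or match
            )
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)


# One substitution (sat -> sit) and one deletion (the) over 6 reference words
print(f"WER = {word_error_rate('the cat sat on the mat', 'the cat sit on mat'):.2%}")
# WER = 33.33%
```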
maitrix-org
Voila is a family of large speech-language foundation models designed to improve human-computer interaction, supporting real-time, low-latency voice interaction and multilingual processing.
dangvansam
VietTTS is an open-source toolkit providing powerful Vietnamese TTS models, supporting natural speech synthesis and voice cloning.
TeamSpeak MCP is a service based on the Model Context Protocol for controlling TeamSpeak servers through AI models (such as Claude), providing comprehensive channel management, user permission control, voice adjustment, and other functions.
TeamSpeak MCP is a server control tool based on the Model Context Protocol, designed to let AI models (such as Claude) manage TeamSpeak voice servers. It provides 39 tools covering all-around operations such as user management, channel control, and permission configuration, and supports several deployment methods (PyPI, Docker, or local) for automated TeamSpeak management.
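Under the hood, an MCP client such as Claude invokes a server's tools with JSON-RPC 2.0 messages: `tools/list` to discover what the server offers, and `tools/call` with a tool `name` and an `arguments` object to run one. The sketch below only builds those messages with the standard library. The tool name `create_channel` and its arguments are hypothetical placeholders, not names taken from TeamSpeak MCP, and a real session must first complete MCP's `initialize` handshake, which is omitted here.

```python
import json
from itertools import count
from typing import Optional

# JSON-RPC request ids must be unique within a session
_ids = count(1)


def jsonrpc_request(method: str, params: Optional[dict] = None) -> str:
    """Serialize a JSON-RPC 2.0 request as a single line of JSON."""
    msg = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        msg["params"] = params
    return json.dumps(msg)  # compact, no embedded newlines


# Ask the server which tools it exposes
list_tools_msg = jsonrpc_request("tools/list")

# Invoke one tool; name and arguments are hypothetical placeholders
call_msg = jsonrpc_request("tools/call", {
    "name": "create_channel",
    "arguments": {"name": "Standup", "max_clients": 10},
})

print(list_tools_msg)
print(call_msg)
```

Over MCP's stdio transport, each message is written as one newline-delimited line, which is why the serializer avoids pretty-printing.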
An intelligent chatbot project built on large models that supports multi-platform access and multiple AI models, offers text, voice, and image processing with plugin extensions, and can be customized into enterprise AI applications.